An Application-Oriented Approach for Accelerating Data-Parallel Computation with Graphics Processing Unit
نویسندگان
چکیده
This paper presents a novel parallelization and quantitative characterization of various optimization strategies for dataparallel computation on a graphics processing unit (GPU) using NVIDIA’s new GPU programming framework, Compute Unified Device Architecture (CUDA). CUDA is an easy-to-use development framework that has drawn the attention of many different application areas looking for dramatic speed-ups in their code. However, the performance tradeoffs in CUDA are not yet fully understood, especially for data-parallel applications. Consequently, we study two fundamental mathematical operations that are common in many data-parallel applications: convolution and accumulation. Specifically, we profile and optimize the performance of these operations on a 128-core NVIDIA GPU. We then characterize the impact of these operations on a video-based motion-tracking algorithm called vector coherence mapping, which consists of a series of convolutions and dynamically weighted accumulations, and present a comparison of different implementations and their respective performance profiles.
منابع مشابه
Parallel Implementation of Particle Swarm Optimization Variants Using Graphics Processing Unit Platform
There are different variants of Particle Swarm Optimization (PSO) algorithm such as Adaptive Particle Swarm Optimization (APSO) and Particle Swarm Optimization with an Aging Leader and Challengers (ALC-PSO). These algorithms improve the performance of PSO in terms of finding the best solution and accelerating the convergence speed. However, these algorithms are computationally intensive. The go...
متن کاملAccelerating Signal Processing Algorithms Using Graphics Processors
There is increased interest in the use of graphics processing units (GPUs) for general purpose computation. This is because GPUs are almost two orders of magnitude faster in terms of floating point throughput compared to conventional CPUs. In this paper we investigate the use of graphics processing units for accelerating signal processing algorithms, specifically FIR filters and the FFT. We des...
متن کاملAchieving Fast Computer-Generated Hologram Calculations via Parallelization
Computer-Generated Holography (CGH) plays an important role in the development of three-dimensional display. However, the enormous computational time for CGH generation hinders the practicality of the CGH—depending on a specific application, it takes several hours, even more to complete CGH computations. To enhance the hologram computation speed, we present a parallel computing approach to acce...
متن کاملLHCb GPU acceleration project
The LHCb detector is due to be upgraded for processing high-luminosity collisions, which will increase data bandwidth to the event filter farm from 100GB/s to 4 TB/s, encouraging us to look for new ways of accelerating Online reconstruction. The Coprocessor Manager is a new framework for integrating LHCb’s existing computation pipelines with massively parallel algorithms running on GPUs and oth...
متن کاملSolving Quadratic Programming Problems on Graphics Processing Unit
Quadratic Programming (QP) problems frequently appear as core component when solving constrained optimal control or estimation problems. The focus of this paper is on accelerating an existing Interior Point Method (IPM) for solving QP problems by exploiting the parallel computing characteristics of GPU. We compare the so-called data-parallel and the problem-parallel approaches to achieve speed ...
متن کامل